Product recognition in store shelves as a sub-graph isomorphism problem
The arrangement of products in store shelves is carefully planned to maximize
sales and keep customers happy. However, verifying compliance of real shelves
to the ideal layout is a costly task routinely performed by the store
personnel. In this paper, we propose a computer vision pipeline to recognize
products on shelves and verify compliance to the planned layout. We deploy
local invariant features together with a novel formulation of the product
recognition problem as a sub-graph isomorphism between the items appearing in
the given image and the ideal layout. This allows for auto-localizing the given
image within the aisle or store and improving recognition dramatically.
Comment: Slightly extended version of the paper accepted at ICIAP 2017. More
information on the project page:
http://vision.disi.unibo.it/index.php?option=com_content&view=article&id=111&catid=7
Zero-Annotation Object Detection with Web Knowledge Transfer
Object detection is one of the major problems in computer vision, and has
been extensively studied. Most of the existing detection works rely on
labor-intensive supervision, such as ground truth bounding boxes of objects or
at least image-level annotations. On the contrary, we propose an object
detection method that does not require any form of human annotation on target
tasks, by exploiting freely available web images. In order to facilitate
effective knowledge transfer from web images, we introduce a multi-instance
multi-label domain adaptation learning framework with two key innovations. First
of all, we propose an instance-level adversarial domain adaptation network with
attention on foreground objects to transfer the object appearances from web
domain to target domain. Second, to preserve the class-specific semantic
structure of transferred object features, we propose a simultaneous transfer
mechanism to transfer the supervision across domains through pseudo strong
label generation. With our end-to-end framework that simultaneously learns a
weakly supervised detector and transfers knowledge across domains, we achieved
significant improvements over baseline methods on the benchmark datasets.
Comment: Accepted in ECCV 201
Loss Guided Activation for Action Recognition in Still Images
One significant problem of deep-learning based human action recognition is
that it can be easily misled by the presence of irrelevant objects or
backgrounds. Existing methods commonly address this problem by employing
bounding boxes on the target humans as part of the input, in both training and
testing stages. This requirement of bounding boxes as part of the input is
needed to enable the methods to ignore irrelevant contexts and extract only
human features. However, we consider this solution inefficient, since the
bounding boxes might not be available. Hence, instead of using a person
bounding box as an input, we introduce a human-mask loss to automatically guide
the activations of the feature maps to the target human who is performing the
action, and hence suppress the activations of misleading contexts. We propose a
multi-task deep learning method that jointly predicts the human action class
and human location heatmap. Extensive experiments demonstrate that our approach is
more robust than the baseline methods in the presence of irrelevant
misleading contexts. Our method achieves 94.06% and 40.65% (in terms of mAP)
on the Stanford40 and MPII datasets, respectively, which are 3.14% and 12.6%
relative improvements over the best results reported in the literature, and
thus sets new state-of-the-art results. Additionally, unlike some existing
methods, we eliminate the requirement of using a person bounding box as an
input during testing.
Comment: Accepted to appear in ACCV 201
Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection
Object detection has witnessed significant progress by relying on large,
manually annotated datasets. Annotating such datasets is highly time consuming
and expensive, which motivates the development of weakly supervised and
few-shot object detection methods. However, these methods largely underperform
with respect to their strongly supervised counterparts, as weak training signals
often result in partial or oversized detections. Towards solving this
problem we introduce, for the first time, an online annotation module (OAM)
that learns to generate a many-shot set of reliable annotations from a
larger volume of weakly labelled images. Our OAM can be jointly trained with
any fully supervised two-stage object detection method, providing additional
training annotations on the fly. This results in a fully end-to-end strategy
that only requires a low-shot set of fully annotated images. The integration of
the OAM with Fast(er) R-CNN improves their performance by mAP,
AP50 on PASCAL VOC 2007 and MS-COCO benchmarks, and significantly outperforms
competing methods using mixed supervision.
Comment: Accepted at ECCV 2020. Camera-ready version and Appendice
Weakly Supervised Semantic Segmentation Using Constrained Dominant Sets
The availability of large-scale data sets is an essential pre-requisite for
deep learning based semantic segmentation schemes. Since obtaining pixel-level
labels is extremely expensive, supervising deep semantic segmentation networks
using low-cost weak annotations has been an attractive research problem in
recent years. In this work, we explore the potential of Constrained Dominant
Sets (CDS) for generating multi-labeled full mask predictions to train a fully
convolutional network (FCN) for semantic segmentation. Our experimental results
show that using CDS yields higher-quality mask predictions compared to
methods that have been adopted in the literature for the same purpose.
LifeCLEF 2016: Multimedia Life Species Identification Challenges
Using multimedia identification tools is considered one of the most promising solutions to help bridge the taxonomic gap and build accurate knowledge of the identity, the geographic distribution and the evolution of living species. Large and structured communities of nature observers (e.g., iSpot, Xeno-canto, Tela Botanica, etc.) as well as big monitoring equipment have actually started to produce outstanding collections of multimedia records. Unfortunately, the performance of the state-of-the-art analysis techniques on such data is still not well understood and is far from reaching real world requirements. The LifeCLEF lab proposes to evaluate these challenges around 3 tasks related to multimedia information retrieval and fine-grained classification problems in 3 domains. Each task is based on large volumes of real-world data and the measured challenges are defined in collaboration with biologists and environmental stakeholders to reflect realistic usage scenarios. For each task, we report the methodology, the data sets as well as the results and the main outcom
Vehicle Detection Using Alex Net and Faster R-CNN Deep Learning Models: A Comparative Study
This paper was presented at the 5th International Visual Informatics Conference (IVIC 2017).
This paper presents a comparative study of two deep learning models used here for vehicle detection. Alex Net and Faster R-CNN are compared through the analysis of an urban video sequence. Several tests were carried out to evaluate the quality of detections, the failure rates and the time taken to complete the detection task. The results support important conclusions regarding the architectures and strategies used for implementing such networks for the task of video detection, encouraging future research in this topic.
S.A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, the Ministerio de Economía y Competitividad (COFUND2013-51509) and Banco Santander. The authors wish to thank Dr. Fei Yin for the code for the metrics employed in the evaluations. Finally, we gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research. The data and code used for this work are available upon request from the authors.
LifeCLEF 2015: Multimedia Life Species Identification Challenges
Using multimedia identification tools is considered one of the most promising solutions to help bridge the taxonomic gap and build accurate knowledge of the identity, the geographic distribution and the evolution of living species. Large and structured communities of nature observers (e.g. eBird, Xeno-canto, Tela Botanica, etc.) as well as big monitoring equipment have actually started to produce outstanding collections of multimedia records. Unfortunately, the performance of the state-of-the-art analysis techniques on such data is still not well understood and is far from reaching real-world requirements. The LifeCLEF lab proposes to evaluate these challenges around three tasks related to multimedia information retrieval and fine-grained classification problems in three living worlds. Each task is based on large, real-world data and the measured challenges are defined in collaboration with biologists and environmental stakeholders in order to reflect realistic usage scenarios. This paper presents more particularly the 2014 edition of LifeCLEF, i.e. the pilot one. For each of the three tasks, we report the methodology and the datasets as well as the official results and the main outcomes.
Early esophageal adenocarcinoma detection using deep learning methods
Purpose: This study aims to adapt and evaluate the performance of different state-of-the-art deep learning object detection methods to automatically identify esophageal adenocarcinoma (EAC) regions from high-definition white light endoscopy (HD-WLE) images.
Method: Several state-of-the-art object detection methods using Convolutional Neural Networks (CNNs) were adapted to automatically detect abnormal regions in HD-WLE images of the esophagus, utilizing VGG’16 as the backbone architecture for feature extraction. Those methods are Regional-based Convolutional Neural Network (R-CNN), Fast R-CNN, Faster R-CNN and Single-Shot Multibox Detector (SSD). The methods were evaluated on 100 images from 39 patients, manually annotated by five experienced clinicians as ground truth.
Results: Experimental results illustrate that the SSD and Faster R-CNN networks show promising results, and the SSD outperforms the other methods, achieving a sensitivity of 0.96, specificity of 0.92 and F-measure of 0.94. Additionally, the Average Recall Rate of the Faster R-CNN in locating the EAC region accurately is 0.83.
Conclusion: In this paper, recent deep learning object detection methods are adapted to detect esophageal abnormalities automatically. The evaluation of the methods demonstrated their ability to locate abnormal regions in the esophagus from endoscopic images. Automatic detection is a crucial step that may help early detection and treatment of EAC and can also improve automatic tumor segmentation to monitor its growth and treatment outcome.
Detecting People in Artwork with CNNs
CNNs have massively improved performance in object detection in photographs.
However research into object detection in artwork remains limited. We show
state-of-the-art performance on a challenging dataset, People-Art, which
contains people from photos, cartoons and 41 different artwork movements. We
achieve this high performance by fine-tuning a CNN for this task, thus also
demonstrating that training CNNs on photos results in overfitting for photos:
only the first three or four layers transfer from photos to artwork. Although
the CNN's performance is the highest yet, it remains less than 60% AP,
suggesting further work is needed for the cross-depiction problem. The final
publication is available at Springer via
http://dx.doi.org/10.1007/978-3-319-46604-0_57
Comment: 14 pages, plus 3 pages of references; 7 figures in ECCV 2016
Workshop